NSF PAR Search | NSF Public Access Repository

On the Robustness of LDP Protocols for Numerical Attributes under Data Poisoning Attacks

https://doi.org/10.14722/ndss.2025.241521

Li, Xiaoguang; Li, Zitao; Li, Ninghui; Sun, Wenhai (January 2025, Internet Society)

Full Text Available
FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model

https://doi.org/10.1145/3637528.3671897

Wu, Feijie; Li, Zitao; Li, Yaliang; Ding, Bolin; Gao, Jing (August 2024, Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Differentially Private Vertical Federated Clustering

https://doi.org/10.14778/3583140.3583146

Li, Zitao; Wang, Tianhao; Li, Ninghui (February 2023, Proceedings of the VLDB Endowment)

In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the final k-means loss to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.
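The abstract builds on the Flajolet-Martin sketch for set intersection cardinality. As a rough illustration only, the sketch below shows a plain, non-private Flajolet-Martin estimator and an inclusion-exclusion estimate of intersection size. It does not reproduce the paper's differentially private algorithm; the function names (`fm_sketch`, `merge`, `estimate`, `intersection_estimate`) and the use of SHA-256 as the hash family are assumptions made for this example.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _lowest_set_bit(x: int, bits: int) -> int:
    """Index of the lowest set bit of x, capped at bits - 1 (also used when x == 0)."""
    if x == 0:
        return bits - 1
    return min((x & -x).bit_length() - 1, bits - 1)


def fm_sketch(items, num_hashes: int = 64, bits: int = 32):
    """Build one bitmap per hash function; each item sets one bit per bitmap."""
    sketch = [0] * num_hashes
    for item in items:
        for j in range(num_hashes):
            digest = hashlib.sha256(f"{j}:{item}".encode()).digest()
            h = int.from_bytes(digest[:4], "big")
            sketch[j] |= 1 << _lowest_set_bit(h, bits)
    return sketch


def merge(a, b):
    """The sketch of a union is the bitwise OR of the two sketches."""
    return [x | y for x, y in zip(a, b)]


def estimate(sketch) -> float:
    """Estimate cardinality from the mean index of the lowest unset bit."""
    total = 0
    for bitmap in sketch:
        r = 0
        while bitmap & (1 << r):
            r += 1
        total += r
    return 2 ** (total / len(sketch)) / PHI


def intersection_estimate(a, b) -> float:
    """Inclusion-exclusion: |A ∩ B| ≈ |A| + |B| - |A ∪ B| (can be noisy)."""
    return estimate(a) + estimate(b) - estimate(merge(a, b))
```

Because sketches merge by bitwise OR, two parties could in principle combine summaries without exchanging raw identifiers; any privacy guarantee would require additional randomization that this example omits.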
Full Text Available
MGD: A Utility Metric for Private Data Publication

https://doi.org/10.1145/3491371.3491385

Li, Zitao; Dang, Trung; Wang, Tianhao; Li, Ninghui (December 2021, NSysS 2021: Proceedings of the 8th International Conference on Networking, Systems and Security)

techniques to protect user data privacy. A common way to utilize private data under DP is to take an input dataset and synthesize a new dataset that preserves features of the input dataset while satisfying DP. A trade-off always exists between the strength of privacy protection and the utility of the final output: stronger privacy protection requires larger randomness, so the outputs usually have a larger variance and can be far from optimal. In this paper, we summarize our proposed metric for the NIST “A Better Meter Stick for Differential Privacy” competition [26], MarGinal Difference (MGD), for measuring the utility of a synthesized dataset. Our metric is based on earth mover’s distance. We introduce new features in our metric so that it is not affected by some small random noise that is unavoidable in the DP context but focuses more on the significant difference. We show that our metric can reflect the range query error better than other existing metrics. We introduce an efficient computation method based on the min-cost flow to alleviate the high computation cost of the earth mover’s distance.
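To make the underlying distance concrete, the following minimal sketch computes the earth mover's distance between two one-dimensional histograms defined over the same unit-spaced bins, where it reduces to a sum of absolute cumulative differences. This is not the MGD metric itself, and it does not use the min-cost-flow method mentioned in the abstract; the function name `emd_1d` is hypothetical.

```python
def emd_1d(p, q) -> float:
    """Earth mover's distance between two histograms on identical unit-spaced bins.

    Both histograms are normalized to total mass 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    carried = 0.0   # mass that still has to be moved past the current bin
    distance = 0.0
    for a, b in zip(p, q):
        carried += a / total_p - b / total_q
        distance += abs(carried)
    return distance
```

For example, moving all mass from the first bin to the third bin of a three-bin histogram costs 2 bin widths, while moving it to the adjacent bin costs 1.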
Full Text Available
Federated matrix factorization with privacy guarantee

https://doi.org/10.14778/3503585.3503598

Li, Zitao; Ding, Bolin; Zhang, Ce; Li, Ninghui; Zhou, Jingren (December 2021, Proceedings of the VLDB Endowment)

Matrix factorization (MF) approximates unobserved ratings in a rating matrix, whose rows correspond to users and columns correspond to items to be rated, and has been serving as a fundamental building block in recommendation systems. This paper comprehensively studies the problem of matrix factorization in different federated learning (FL) settings, where a set of parties want to cooperate in training but refuse to share data directly. We first propose a generic algorithmic framework for various settings of federated matrix factorization (FMF) and provide a theoretical convergence guarantee. We then systematically characterize privacy-leakage risks in data collection, training, and publishing stages for three different settings and introduce privacy notions to provide end-to-end privacy protections. The first one is vertical federated learning (VFL), where multiple parties have the ratings from the same set of users but on disjoint sets of items. The second one is horizontal federated learning (HFL), where parties have ratings from different sets of users but on the same set of items. The third setting is local federated learning (LFL), where the ratings of the users are only stored on their local devices. We introduce adapted versions of FMF with the privacy notions guaranteed in the three settings. In particular, a new private learning technique called embedding clipping is introduced and used in all three settings to ensure differential privacy. For the LFL setting, we combine differential privacy with secure aggregation to protect the communication between user devices and the server with a strength similar to the local differential privacy model, but with much better accuracy. We perform experiments to demonstrate the effectiveness of our approaches.
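The abstract does not define embedding clipping. The sketch below shows generic L2-norm clipping of an embedding vector, a common way to bound sensitivity in differentially private learning; treating this as what the paper's technique resembles is purely an assumption, and the name `clip_embedding` and the parameter `max_norm` are hypothetical.

```python
import math


def clip_embedding(vec, max_norm: float):
    """Rescale vec so that its L2 norm is at most max_norm (assumed clipping rule)."""
    if max_norm <= 0:
        raise ValueError("max_norm must be positive")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm:
        return list(vec)  # already within the bound, returned unchanged
    scale = max_norm / norm
    return [x * scale for x in vec]
```

Bounding the norm of each user or item embedding limits how much any single record can influence a shared update, which is what allows calibrated noise to be added afterwards.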
Full Text Available
Estimating Numerical Distributions under Local Differential Privacy

https://doi.org/10.1145/3318464.3389700

Li, Zitao; Wang, Tianhao; Lopuhaä-Zwakenberg, Milan; Li, Ninghui; Škoric, Boris (June 2020, SIGMOD '20: Proceedings of the 2020 International Conference on Management of Data)

When collecting information, local differential privacy (LDP) relieves the concern of privacy leakage from the users' perspective, as a user's private information is randomized before being sent to the aggregator. We study the problem of recovering the distribution over a numerical domain while satisfying LDP. While one can discretize a numerical domain and then apply the protocols developed for categorical domains, we show that taking advantage of the numerical nature of the domain results in a better trade-off between privacy and utility. We introduce a new reporting mechanism, called the square wave (SW) mechanism, which exploits the numerical nature in reporting. We also develop an Expectation Maximization with Smoothing (EMS) algorithm, which is applied to aggregated histograms from the SW mechanism to estimate the original distributions. Extensive experiments demonstrate that our proposed approach, SW with EMS, consistently outperforms other methods in a variety of utility metrics.
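The abstract gives no details of the square wave mechanism, so the following is only a sketch under stated assumptions: inputs lie in [0, 1], reports lie in [-b, 1 + b], and the report density is a higher constant p within distance b of the true value and a lower constant q elsewhere, with p / q = e^ε so that ε-LDP holds. The bandwidth `b` is treated as a free parameter here, and the function names are hypothetical.

```python
import math
import random


def sw_probabilities(epsilon: float, b: float):
    """Densities (p, q) with p / q = e^epsilon and total mass 2*b*p + 1*q = 1."""
    e = math.exp(epsilon)
    p = e / (2 * b * e + 1)
    q = 1 / (2 * b * e + 1)
    return p, q


def sw_report(v: float, epsilon: float, b: float, rng=random) -> float:
    """Randomize v in [0, 1] into a report in [-b, 1 + b]."""
    if not 0.0 <= v <= 1.0:
        raise ValueError("v must lie in [0, 1]")
    p, _ = sw_probabilities(epsilon, b)
    if rng.random() < 2 * b * p:
        # "Near" region: uniform over [v - b, v + b].
        return rng.uniform(v - b, v + b)
    # "Far" region: [-b, v - b) and [v + b, 1 + b), total length exactly 1.
    u = rng.random()
    if u < v:
        return -b + u       # left part, length v
    return b + u            # right part, length 1 - v
```

Because reports close to the true value are more likely than distant ones, the aggregated histogram retains the ordering information of the numerical domain, which a categorical protocol would discard.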
Full Text Available
Locally Differentially Private Frequency Estimation with Consistency

https://doi.org/10.14722/ndss.2020.24157

Wang, Tianhao; Lopuhaä-Zwakenberg, Milan; Li, Zitao; Škoric, Boris; Li, Ninghui (February 2020, NDSS'20: Proceedings of the NDSS Symposium)

Local Differential Privacy (LDP) protects user privacy from the data collector. LDP protocols have been increasingly deployed in the industry. A basic building block is frequency oracle (FO) protocols, which estimate frequencies of values. While several FO protocols have been proposed, the design goal does not lead to optimal results for answering many queries. In this paper, we show that adding post-processing steps to FO protocols by exploiting the knowledge that all individual frequencies should be non-negative and they sum up to one can lead to significantly better accuracy for a wide range of tasks, including frequencies of individual values, frequencies of the most frequent values, and frequencies of subsets of values. We consider 10 different methods that exploit this knowledge differently. We establish theoretical relationships between some of them and conduct extensive experimental evaluations to understand which methods should be used for different query tasks.
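The abstract does not list the ten post-processing methods. As one illustrative way to enforce both constraints (non-negativity and summing to one), the sketch below computes the Euclidean projection of noisy frequency estimates onto the probability simplex; it is not claimed to be one of the paper's methods, and the name `project_to_simplex` is hypothetical.

```python
def project_to_simplex(freqs):
    """Return the closest vector (in L2 distance) that is non-negative and sums to 1."""
    if not freqs:
        raise ValueError("freqs must be non-empty")
    shift = 0.0
    cumulative = 0.0
    # Scan values in descending order to find the shift that, after
    # clamping negatives to zero, makes the result sum to exactly one.
    for k, value in enumerate(sorted(freqs, reverse=True), start=1):
        cumulative += value
        candidate = (cumulative - 1.0) / k
        if value - candidate > 0:
            shift = candidate
    return [max(f - shift, 0.0) for f in freqs]
```

For noisy estimates such as `[0.5, 0.7, -0.2]`, the negative entry is clamped to zero and the remaining entries are shifted down equally so that the total is one.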
Full Text Available
